NEXUS is a production-grade, full-text and semantic search engine built from scratch, implementing advanced data structures and distributed systems concepts. It focuses on probabilistic optimization, sub-millisecond latency, and hybrid AI-powered search. The project demonstrates core technologies like LSM Trees, Bloom Filters, HNSW Graphs, and W-TinyLFU caches, integrated into a high-performance pipeline. It also includes a LeetCode algorithm library with implementations of classic interview patterns and provides insights into distributed crawling and persistent storage.
usersDF.write.format("orc")
.option("orc.bloom.filter.columns", "favorite_color")
.option("orc.dictionary.key.threshold", "1.0")
.option("orc.column.encoding.direct", "name")
.save("users_with_options.orc")
Find full example code at "examples/src/main/scala/org/apache/spark/examples/sql/SQLDataSourceExample.scala" in the Spark repo